1 Introduction



The occupancy scheme is a simple urn model in probability theory that possesses a variety of 
applications to statistics, combinatorics, and computer science. These include, for instance, 
species sampling [7, 12], analysis of algorithms [9], learning theory [6], etc. The books by
Johnson and Kotz [15] and by Kolchin et al. [18] are standard references. 

This model is often depicted as balls-in-bins. Typically, we denote by $\mathrm{Prob}_{\mathcal{I}}$ the space of
probability measures on some countable set of indices $\mathcal{I}$, so each $p \in \mathrm{Prob}_{\mathcal{I}}$ can be identified as
a family $p = (p_i : i \in \mathcal{I})$ of nonnegative real numbers with $\sum_{i \in \mathcal{I}} p_i = 1$. Given some $p \in \mathrm{Prob}_{\mathcal{I}}$,
one throws balls successively and independently in a fixed series of boxes labeled by indices $i$
in $\mathcal{I}$, and assumes that each ball has probability $p_i$ of falling into the box $i$. For all integers
$j, n$ with $j \le n$, we denote by $N^p_{n,j}$ the number of boxes containing exactly $j$ balls when $n$ balls
have been thrown, and by

$$N^p_n := \sum_{j=1}^{\infty} N^p_{n,j}$$

the total number of occupied boxes. 
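
As an illustration only, the classical scheme can be simulated in a few lines. The sketch below is not part of the original argument; the function name `occupancy_counts`, the four-box measure `p` and the seed are arbitrary choices made for the example.

```python
import random
from collections import Counter

def occupancy_counts(p, n, rng):
    """Throw n balls independently into boxes 0..len(p)-1 with probabilities p.

    Returns a Counter mapping j to the number of boxes holding exactly j balls
    (only j >= 1 appears, since empty boxes are never recorded).
    """
    boxes = rng.choices(range(len(p)), weights=p, k=n)
    balls_per_box = Counter(boxes)          # box index -> number of balls
    return Counter(balls_per_box.values())  # j -> number of boxes with j balls

rng = random.Random(0)                      # fixed seed, for reproducibility only
p = [0.5, 0.25, 0.125, 0.125]               # illustrative probability measure
counts = occupancy_counts(p, 20, rng)
occupied = sum(counts.values())             # total number of occupied boxes
```

Here `counts[j]` plays the role of the number of boxes containing exactly j balls, and `occupied` that of the total number of occupied boxes.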

We consider here a variant of this occupancy scheme which corresponds to a nested family 
of boxes. This is conveniently described in terms of the genealogical structure of populations, 
so we start by recalling some notions in this area. We introduce the infinite genealogical tree 

$$\mathcal{T} := \bigcup_{k=0}^{\infty} \mathbb{N}^k,$$

with $\mathbb{N} := \{1, 2, \ldots\}$ and the convention $\mathbb{N}^0 := \{\varnothing\}$. The elements of $\mathcal{T}$ are called individuals,
and for every integer $k$, the $k$-th generation of $\mathcal{T}$ is formed by the individuals in $\mathbb{N}^k$. The
boundary $\partial\mathcal{T} = \mathbb{N}^{\mathbb{N}}$ of $\mathcal{T}$ is the set of infinite sequences $\ell = (\ell_1, \ell_2, \ldots)$ of positive integers,
which we call leaves. For each leaf $\ell = (\ell_1, \ell_2, \ldots)$ and each integer $k$, we write $\ell^{(k)} = (\ell_1, \ldots, \ell_k)$
for the ancestor of $\ell$ at generation $k$. Conversely, for every individual at generation $k$, say $i \in \mathbb{N}^k$,
we denote by $\partial\mathcal{T}_i$ the subset of leaves whose ancestor at generation $k$ is $i$. In particular, the
root $\varnothing$ of the genealogical tree should be viewed as the progenitor of the entire population, and
$\partial\mathcal{T}_{\varnothing} = \partial\mathcal{T}$.

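Then consider some probability measure, say $P$, on $\partial\mathcal{T}$, and imagine that we sample a
sequence $\lambda_{(1)}, \lambda_{(2)}, \ldots$ of i.i.d. random leaves according to the law $P$. For all fixed integers
$k, n \in \mathbb{N}$, we denote by $N_n^{(k)}$ the number of ancestors at generation $k$ of the first $n$ leaves:

$$N_n^{(k)} := \mathrm{Card}\{\lambda_{(m)}^{(k)} : m \le n\}
= \mathrm{Card}\{i \in \mathbb{N}^k : \mathrm{Card}(\partial\mathcal{T}_i \cap \{\lambda_{(1)}, \ldots, \lambda_{(n)}\}) \ge 1\}.$$

More precisely, we may also consider for every integer $1 \le j \le n$

$$N_{n,j}^{(k)} := \mathrm{Card}\{i \in \mathbb{N}^k : \mathrm{Card}(\partial\mathcal{T}_i \cap \{\lambda_{(1)}, \ldots, \lambda_{(n)}\}) = j\},$$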

the number of individuals $i$ at generation $k$ such that the boundary $\partial\mathcal{T}_i$ of the subtree that
stems from $i$ contains exactly $j$ leaves among $\{\lambda_{(1)}, \ldots, \lambda_{(n)}\}$. The connexion with the classical
occupancy scheme may be better understood by viewing the random leaves as balls which are
thrown on the boundary of the tree, and then imagining that each ball falls down following the
branch from the leaf to the root $\varnothing$. Each individual $i$ can be thought of as a box, and if balls
are thrown randomly according to the probability measure $P$ on the boundary of the tree, then
the probability that some given ball passes through the box $i$ at generation $k$ is

$$p_i(k) := P(\partial\mathcal{T}_i).$$

Clearly, $p(k) := (p_i(k) : i \in \mathbb{N}^k)$ defines a probability measure on $\mathbb{N}^k$ for each generation $k$, and
the sequence of discrete probability measures $(p(k) : k \in \mathbb{N})$ determines $P$.

Recently, Gnedin et al. [11] have considered asymptotic laws for a randomized version of
the classical occupancy scheme, where the discrete probability $p \in \mathrm{Prob}_{\mathcal{I}}$ is random (more
precisely, $p$ is obtained from the atoms of some Poisson point measure on $]0, \infty[$). In the
present work, we will be interested in a situation when the probability measure $P$ on $\partial\mathcal{T}$ (and
therefore also each probability measure $p(k)$ on $\mathbb{N}^k$) is random. So henceforth, conditionally
on $P$, each leaf $\lambda_{(1)}, \ldots$ is picked randomly according to $P$ and independently of the others.
This is equivalent to assuming that the sequence of random leaves is exchangeable with de
Finetti measure $P$. More precisely, we shall assume that $P$ is given by some multiplicative
cascade; see Liu [19] and the references therein. This means that we consider first some random
probability measure $\varrho = (\varrho_1, \ldots)$ in $\mathrm{Prob}_{\mathbb{N}}$ and assign to each individual $i = (i_1, \ldots, i_k)$ of the
genealogical tree an independent copy $\varrho(i)$ of $\varrho$. Roughly speaking, $\varrho(i)$ describes how the mass
$p_i(k) = P(\partial\mathcal{T}_i)$ is split among the subsets of leaves $\partial\mathcal{T}_{ij}$ for $j \in \mathbb{N}$, where $ij = (i_1, \ldots, i_k, j)$
denotes the $j$-th child of the individual $i$ at generation $k + 1$. Specifically, $\varrho_j(i)$ is the portion
of the mass of $i$ inherited by the child $ij$, i.e.



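$$\varrho_j(i) = \frac{p_{ij}(k+1)}{p_i(k)},$$

so that, by iteration,

$$p_i(k) = \varrho_{i_1}(\varnothing) \times \varrho_{i_2}(i^{(1)}) \times \cdots \times \varrho_{i_k}(i^{(k-1)}), \qquad (1)$$

where $i^{(k')} = (i_1, \ldots, i_{k'})$ denotes the ancestor of $i$ at generation $k' \le k$. Clearly, for each integer
$k$, $p(k) = (p_i(k) : i \in \mathbb{N}^k)$ now defines a random probability measure on $\mathbb{N}^k$, and we can identify
the conditional laws

$$\mathcal{L}\big(N_{n,j}^{(k)} \mid p(k)\big) = \mathcal{L}\big(N_{n,j}^{p(k)}\big) \quad \text{and} \quad \mathcal{L}\big(N_{n}^{(k)} \mid p(k)\big) = \mathcal{L}\big(N_{n}^{p(k)}\big). \qquad (2)$$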
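
To make the construction (1) concrete, here is a minimal sketch that builds the masses at generation k by iterating independent splits. It rests on assumptions that are not in the text: each individual has only `m = 3` children, and the hypothetical splitting law `split` uses normalised i.i.d. exponential weights (that is, a uniform pick on the simplex). The names `split` and `cascade` are illustrative.

```python
import random

def split(rng, m=3):
    """Hypothetical splitting law: normalised i.i.d. exponential weights on m children."""
    w = [rng.expovariate(1.0) for _ in range(m)]
    s = sum(w)
    return [x / s for x in w]

def cascade(k, rng, m=3):
    """Return a dict {individual (tuple of child indices): mass} at generation k.

    Each individual receives its own independent copy of the splitting law,
    and the mass of a child is the mass of its parent times its share, as in (1).
    """
    level = {(): 1.0}                        # the root carries the whole mass
    for _ in range(k):
        nxt = {}
        for ind, mass in level.items():
            for j, share in enumerate(split(rng, m), start=1):
                nxt[ind + (j,)] = mass * share
        level = nxt
    return level

rng = random.Random(1)                       # fixed seed, for reproducibility only
p3 = cascade(3, rng)                         # masses of the 27 individuals at generation 3
```

Conditionally on `p3`, throwing balls according to these masses is an ordinary occupancy scheme, which is the content of (2).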

Our main purpose is to determine the asymptotic regimes of the numbers of occupied boxes
$N_{n,j}^{(k)}$ and $N_n^{(k)}$ when both $n$ and $k$ tend to infinity. It is easily seen from routine estimates that
non-degenerate limits should occur when $k \approx \ln n$. Since both $k$ and $n$ are integers, a natural
regime thus could be $k = \lfloor a \ln n \rfloor$ for some real number $a > 0$, where the notation $\lfloor \cdot \rfloor$ refers to
the integer part. It turns out that this is actually too crude, in the sense that for $k = \lfloor a \ln n \rfloor$,
the asymptotic behavior of $N_{n,j}^{(k)}$ does not only depend on $a$, but also on the oscillations of the
fractional part $\{a \ln n\}$. Indeed, we shall establish a law of large numbers for $N_{n,j}^{(k)}$ and a central
limit theorem for $N_n^{(k)}$ when $k, n \to \infty$ in such a way that $k = a \ln n + b + o(1)$ for fixed real
numbers $a$ and $b$ in certain intervals.

Our approach essentially combines uniform probability estimates for the classical occupancy 
scheme and information about asymptotic behaviors in multiplicative cascades which can be 
gleaned from the literature and will be reviewed in Section 2. In particular, the analysis 
of multiplicative cascades relies crucially on the natural connexion with a class of branching 
random walks, and more precisely, on their large deviations behaviors whose descriptions are 
due to Biggins [5]. The main results about the asymptotic regimes in our model will be presented
and proved in Section 3. They include the law of large numbers and the central limit theorem
mentioned above; we will also study asymptotics of the shattering generation, i.e. the lowest 
generation at which no box contains more than a fixed number of balls. Finally, we shall 
conclude this work by discussing some interpretations of the present results in the framework 
of homogeneous fragmentation processes, which provided the initial motivation for this work. 
2 Preliminaries



2.1 Some limit theorems for the occupancy scheme 

In this section, we lift from the literature on urn models a law of large numbers and a central 
limit theorem for the number of occupied boxes that will be useful in our study. 

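Given an arbitrary discrete probability measure $p = (p_i : i \in \mathcal{I}) \in \mathrm{Prob}_{\mathcal{I}}$, we define first for
every $j \le n$ the number $\bar{N}^p_{n,j}$ of boxes occupied by more than $j$ balls when $n$ balls have been
thrown, viz.

$$\bar{N}^p_{n,j} := \sum_{\ell=j+1}^{n} N^p_{n,\ell}.$$

In particular, for $j = 0$, $\bar{N}^p_{n,0} = N^p_n$. Introduce also for every real number $x \ge 0$

$$\bar{\mu}^p_j(x) := \sum_{i \in \mathcal{I}} \sum_{\ell=j+1}^{\infty} e^{-p_i x} \frac{(p_i x)^{\ell}}{\ell!}.$$

We may now state the following law of large numbers which has its roots in Bahadur [1].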



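Lemma 1 Let $p(1), p(2), \ldots$ be a sequence of discrete probability measures, $(n_k, k \in \mathbb{N})$ a
sequence of positive integers with $\lim_{k\to\infty} n_k = \infty$, and $j \in \mathbb{Z}_+$. Suppose that

$$\sum_{k=1}^{\infty} \frac{1}{\bar{\mu}_j^{p(k)}(n_k)} < \infty \qquad (3)$$

and

$$\lim_{\alpha \to 1} \lim_{k \to \infty} \frac{\bar{\mu}_j^{p(k)}(\alpha n_k)}{\bar{\mu}_j^{p(k)}(n_k)} = 1.$$

Then

$$\lim_{k \to \infty} \frac{\bar{N}^{p(k)}_{n_k, j}}{\bar{\mu}_j^{p(k)}(n_k)} = 1 \quad \text{a.s.}$$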



Although this result should belong to the folklore of limit theorems for urn models, we have not 
been able to find a precise reference where it is stated in this form, and thus we shall provide 
a proof. The argument relies on Poissonization, which is an important technique in this area; 
see, for instance, the surveys by Gnedin et al. [10] or Holst [13].

Proof: We work first with a fixed probability measure $p = (p_i : i \in \mathcal{I})$, but we replace the
deterministic number of balls $n$ by $\mathsf{n}_x$, where $\mathsf{n} = (\mathsf{n}_x, x \ge 0)$ is an independent standard
Poisson process. The key effect of Poissonization is that now, for each $i \in \mathcal{I}$, the number of
balls in the box $i$ has the Poisson distribution with parameter $p_i x$, and that to different boxes
correspond independent Poisson variables. As a consequence, the variable $\beta_i$ which takes the
value 1 if more than $j$ balls occupy the box $i$ and 0 otherwise, has the Bernoulli distribution
with parameter

$$\sum_{\ell=j+1}^{\infty} e^{-p_i x} \frac{(p_i x)^{\ell}}{\ell!},$$

and when $i$ varies in $\mathcal{I}$, these Bernoulli variables are independent. Changing the typography,
we write

$$\bar{\mathsf{N}}^p_{x,j} := \bar{N}^p_{\mathsf{n}_x, j} = \sum_{i \in \mathcal{I}} \beta_i$$

for the number of boxes occupied by more than $j$ balls when $\mathsf{n}_x$ balls have been thrown. By
elementary properties of sums of independent Bernoulli variables, we see that

$$\mathbb{E}\big(\bar{\mathsf{N}}^p_{x,j}\big) = \bar{\mu}^p_j(x) \qquad (4)$$

and

$$\mathrm{Var}\big(\bar{\mathsf{N}}^p_{x,j}\big) \le \bar{\mu}^p_j(x).$$

Thus Chebyshev's inequality ensures that

$$\mathbb{P}\left( \left| \frac{\bar{\mathsf{N}}^p_{x,j}}{\bar{\mu}^p_j(x)} - 1 \right| > \varepsilon \right) \le \frac{1}{\varepsilon^2 \bar{\mu}^p_j(x)}$$

for every $\varepsilon > 0$.

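Next, we replace the fixed probability measure $p$ by $p(k)$ and take $x = \alpha n_k$ for some real
number $\alpha$ close to 1. The bound above combined with our assumptions enables us to apply the
Borel-Cantelli lemma, and we get that

$$\lim_{k \to \infty} \frac{\bar{\mathsf{N}}^{p(k)}_{\alpha n_k, j}}{\bar{\mu}_j^{p(k)}(n_k)} = \ell(\alpha) \quad \text{a.s.,} \qquad (5)$$

with

$$\ell(\alpha) := \lim_{k \to \infty} \frac{\bar{\mu}_j^{p(k)}(\alpha n_k)}{\bar{\mu}_j^{p(k)}(n_k)}.$$

Recall that the number $\mathsf{n}_{\alpha n_k}$ of balls which are thrown has the Poisson distribution
with parameter $\alpha n_k$. On the event $\{\mathsf{n}_{\alpha n_k} \ge n_k\}$, there is the bound

$$\bar{\mathsf{N}}^{p(k)}_{\alpha n_k, j} \ge \bar{N}^{p(k)}_{n_k, j},$$

whereas on the complementary event we have

$$\bar{\mathsf{N}}^{p(k)}_{\alpha n_k, j} \le \bar{N}^{p(k)}_{n_k, j}.$$

Recall that $n_k \to \infty$. Plainly, if $\alpha > 1$, then

$$\lim_{k_0 \to \infty} \mathbb{P}\big(\mathsf{n}_{\alpha n_k} \ge n_k \text{ for all } k \ge k_0\big) = 1,$$

whereas if $\alpha < 1$, then

$$\lim_{k_0 \to \infty} \mathbb{P}\big(\mathsf{n}_{\alpha n_k} < n_k \text{ for all } k \ge k_0\big) = 1.$$

This completes the proof, by using (5) and the assumption that $\lim_{\alpha \to 1} \ell(\alpha) = 1$. □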
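
The quantities driving this proof are easy to evaluate numerically. The following sketch computes the Poissonized mean of (4) and the resulting Chebyshev bound for a given measure; the helper names `poisson_tail`, `mu_bar` and `chebyshev_bound`, as well as the uniform measure on 50 boxes, are illustrative choices and not notation from the text.

```python
import math

def poisson_tail(lam, j):
    """P(Poisson(lam) > j), computed as 1 minus the first j+1 Poisson weights."""
    term, head = math.exp(-lam), 0.0
    for l in range(j + 1):
        head += term
        term *= lam / (l + 1)
    return 1.0 - head

def mu_bar(p, j, x):
    """Mean number of boxes holding more than j balls after Poissonization, as in (4)."""
    return sum(poisson_tail(pi * x, j) for pi in p)

def chebyshev_bound(p, j, x, eps):
    """Upper bound 1 / (eps^2 * mu_bar) on the probability of a relative deviation > eps."""
    return 1.0 / (eps ** 2 * mu_bar(p, j, x))

p = [1.0 / 50] * 50            # illustrative measure: 50 equally likely boxes
m0 = mu_bar(p, 0, 100.0)       # mean number of occupied boxes
m1 = mu_bar(p, 1, 100.0)       # mean number of boxes with at least two balls
```

The bound returned by `chebyshev_bound` becomes summable along a sequence of measures exactly when condition (3) holds, which is what allows the Borel-Cantelli step.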

Remark. If we replace the requirement (3) in Lemma 1 by the weaker $\lim_{k\to\infty} \bar{\mu}_j^{p(k)}(n_k) = \infty$,
the same calculations yield the weak law of large numbers:

$$\lim_{k \to \infty} \frac{\bar{N}^{p(k)}_{n_k, j}}{\bar{\mu}_j^{p(k)}(n_k)} = 1 \quad \text{in probability.}$$

Next, we turn our attention to fluctuations for the number of occupied boxes. Following 
Hwang and Janson [14], we introduce for every fixed probability measure $p = (p_i : i \in \mathcal{I}) \in
\mathrm{Prob}_{\mathcal{I}}$ and every $x \ge 0$

$$\mu_p(x) := \bar{\mu}^p_0(x) = \sum_{i \in \mathcal{I}} \big(1 - e^{-p_i x}\big) \qquad (6)$$

and

$$\sigma^2_p(x) := \sum_{i \in \mathcal{I}} e^{-p_i x}\big(1 - e^{-p_i x}\big) - x^{-1} \Big( \sum_{i \in \mathcal{I}} x p_i e^{-p_i x} \Big)^2. \qquad (7)$$



These quantities provide uniform estimates for the mean and the variance of $N^p_n$; specifically
it is known from Theorem 2.3 in [13] that

$$\big| \mathbb{E}(N^p_n) - \mu_p(n) \big| \le c \qquad (8)$$

and

$$\big| \mathrm{Var}(N^p_n) - \sigma^2_p(n) \big| \le c, \qquad (9)$$

where $c$ denotes some numerical constant (which depends neither on $n$ nor on $p$). This makes
the following central limit theorem quite intuitive (see Corollary 2.5 in [14], and also Dutko [8]
and Karlin [16] for earlier versions).
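
As a quick sanity check of the approximation (8), one can compare the approximate mean (6) with the exact expectation, which is the sum over boxes of $1 - (1 - p_i)^n$ by linearity. The sketch below does this for an illustrative uniform measure on 10 boxes; the function names are hypothetical, and the example is not taken from the references.

```python
import math

def exact_mean(p, n):
    """Exact expected number of occupied boxes after n balls: sum of 1 - (1 - p_i)^n."""
    return sum(1.0 - (1.0 - pi) ** n for pi in p)

def mu(p, x):
    """Approximate mean from (6)."""
    return sum(1.0 - math.exp(-pi * x) for pi in p)

def sigma2(p, x):
    """Approximate variance from (7)."""
    first = sum(math.exp(-pi * x) * (1.0 - math.exp(-pi * x)) for pi in p)
    second = sum(x * pi * math.exp(-pi * x) for pi in p) ** 2 / x
    return first - second

p = [0.1] * 10                                  # illustrative measure
gap = abs(exact_mean(p, 20) - mu(p, 20))        # size of the error in (8) for this example
var_approx = sigma2(p, 20)                      # approximate variance for this example
```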

Lemma 2 Let $p(1), p(2), \ldots$ be a sequence of discrete probability measures and $(k_n, n \in \mathbb{N})$ a
sequence of positive integers such that

$$\lim_{n \to \infty} \sigma^2_{p(k_n)}(n) = \infty.$$

Then the number of occupied boxes is asymptotically normally distributed when $n$ goes to infinity,
in the sense that

$$\frac{N_n^{p(k_n)} - \mu_{p(k_n)}(n)}{\sigma_{p(k_n)}(n)}$$

converges in distribution to a standard normal variable as $n \to \infty$.

We do not know whether a similar central limit theorem holds for the number $N^p_{n,j}$ of boxes
occupied by exactly $j$ balls for $j \ge 1$.
2.2 Large deviations behaviors of multiplicative cascades



Recall that $\varrho$ is a random probability measure on $\mathbb{N}$. We denote its law by $\nu$, so $\nu$ is a probability
measure on $\mathrm{Prob}_{\mathbb{N}}$ that will be referred to as the splitting law. We shall always assume that
this splitting law is not geometric¹, in the sense that there is no real number $r > 0$ such that
with probability one, all the atoms of $\varrho$ belong to $\{r^n, n \in \mathbb{Z}_+\}$. In particular, note that the
degenerate case when $\varrho$ is a Dirac point mass a.s. is henceforth excluded.

As was explained in the Introduction, we consider a family $(\varrho(i), i \in \mathcal{T})$ of independent
copies of $\varrho$ labeled by the individuals of the genealogical tree $\mathcal{T}$. The multiplicative cascade
construction (1) defines a random probability measure $p(k)$ on $\mathbb{N}^k$ for every generation $k \in \mathbb{N}$.
Our aim is to apply general asymptotic results for occupancy schemes such as Lemmas 1 and 2,
and in this direction, we shall use fundamental large deviations behaviors for branching random
walks that Biggins [5] established.

Taking logarithms of masses, we may encode the random probability measure $p(k) = (p_i(k) :
i \in \mathbb{N}^k)$ at generation $k$ by the random point measure on $\mathbb{R}_+$

$$Z^{(k)}(dy) := \sum \delta_{-\ln p_i(k)}(dy),$$

where $\delta_z$ stands for the Dirac point mass at $z$ and the sum in the right-hand side is taken over
the individuals $i$ at the $k$-th generation which have a positive mass. It then follows immediately
from the structure of multiplicative cascade (1) that $(Z^{(k)}, k \in \mathbb{Z}_+)$ is a branching random walk,
in the sense that for all integers $k, k' \ge 0$, $Z^{(k+k')}$ is obtained from $Z^{(k)}$ by replacing each
atom $z$ of $Z^{(k)}$ by a family $\{z + y, y \in \mathcal{Y}\}$, where $\mathcal{Y}$ is distributed as the family of the atoms of
$Z^{(k')}$ and distinct atoms $z$ of $Z^{(k)}$ correspond to independent copies of $\mathcal{Y}$.

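We now introduce analytic quantities defined in terms of the splitting law $\nu$ which will have
an important role in the present study. First, we define the Laplace transform of the intensity
measure of $Z^{(1)}$ by

$$L(\theta) := \mathbb{E}\big(\langle Z^{(1)}, e^{-\theta \cdot} \rangle\big)$$

for $\theta > 0$; note that there are also the alternative expressions

$$L(\theta) = \mathbb{E}\Big( \sum_{j \in \mathbb{N}} \varrho_j^{\theta} \Big) = \int_{\mathrm{Prob}_{\mathbb{N}}} \Big( \sum_{i \in \mathbb{N}} p_i^{\theta} \Big) \nu(dp). \qquad (10)$$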

The function $L : ]0, \infty[ \to ]0, \infty]$ is convex decreasing with $L(1) = 1$; we define

$$\theta_* := \inf\{\theta > 0 : L(\theta) < \infty\}, \qquad (11)$$

so that $L(\theta) < \infty$ when $\theta > \theta_*$. One readily sees from Hölder's inequality that $\ln L$ is a convex
function, and then that

$$\varphi(\theta) := \ln L(\theta) - \theta \frac{L'(\theta)}{L(\theta)} \qquad (12)$$

is a function which decreases on $]\theta_*, \infty[$.

¹ Working with a geometric splitting law would induce a phenomenon of periodicity which we shall not
discuss here for simplicity. However, results similar to those proven in this work can be established by the same
techniques for geometric splitting laws.

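As $L(1) = 1$ and $L$ decreases, we have $\varphi(1) = -L'(1) > 0$, and thus the set of $\theta \in ]\theta_*, \infty[$ such
that $\varphi(\theta) > 0$ is a non-empty open interval $]\theta_*, \theta^*[$, where $\theta^* > 1$ is defined by

$$\theta^* := \sup\{\theta > \theta_* : \varphi(\theta) > 0\}. \qquad (13)$$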

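Remark: The critical parameter $\theta^*$ may be finite or infinite, and is finite whenever

$$\Big\| \max_{i \in \mathbb{N}} \varrho_i \Big\|_{\infty} = 1,$$

where $\varrho = (\varrho_i, i \in \mathbb{N})$ denotes a random probability measure on $\mathbb{N}$ with law $\nu$. Indeed, it is
easily seen that

$$\lim_{\theta \to \infty} L(\theta)^{1/\theta} = \Big\| \max_{i \in \mathbb{N}} \varrho_i \Big\|_{\infty},$$

and when the right-hand side equals 1, the function $g : \theta \mapsto -\theta^{-1} \ln L(\theta)$ thus has limit 0 at infinity.
Since $g$ is non-negative on $[1, \infty[$ and $g(1) = 0$, $g$ reaches its overall maximum at some location,
say $\theta_{\max} \in ]1, \infty[$. As $g'(\theta) = \theta^{-2} \varphi(\theta)$, we conclude that $\theta_{\max} = \theta^* < \infty$.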

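Following Biggins [I], we are now able to introduce for every $\theta > \theta_*$

$$W^{(k)}(\theta) := L(\theta)^{-k} \langle Z^{(k)}, e^{-\theta \cdot} \rangle = L(\theta)^{-k} \sum_{i \in \mathbb{N}^k} p_i(k)^{\theta}, \qquad k \ge 0,$$

which form a remarkable family of martingales:

Lemma 3 For every $\theta \in ]\theta_*, \theta^*[$, the martingale $(W^{(k)}(\theta), k \in \mathbb{Z}_+)$ is bounded in $L^{\gamma}(\mathbb{P})$ for
some $\gamma > 1$. Its terminal value

$$W(\theta) := \lim_{k \to \infty} W^{(k)}(\theta)$$

is (strictly) positive a.s.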

Proof: Jensen's inequality implies that for every probability measure $p \in \mathrm{Prob}_{\mathbb{N}}$ and every
$\gamma > 1$, there is the upper bound

$$\Big( \sum_{i \in \mathbb{N}} p_i^{\theta} \Big)^{\gamma} \le \sum_{i \in \mathbb{N}} p_i^{\gamma(\theta - 1) + 1}.$$

For any $\theta > \theta_*$, we may choose $\gamma > 1$ sufficiently small such that $\gamma(\theta - 1) + 1 > \theta_*$, and we
deduce that $\mathbb{E}\big(W^{(1)}(\theta)^{\gamma}\big) < \infty$.

We then observe that the function $f : \theta \mapsto \theta^{-1} \ln L(\theta)$ has derivative $f'(\theta) = -\theta^{-2} \varphi(\theta)$. Thus
this derivative is negative when $\theta \in ]\theta_*, \theta^*[$, which means that $f$ decreases in some neighborhood
of $\theta$. We may thus find $\gamma > 1$ sufficiently small such that

$$\frac{\ln L(\gamma\theta)}{\gamma\theta} < \frac{\ln L(\theta)}{\theta},$$

and hence $L(\gamma\theta) < L(\theta)^{\gamma}$. We can now apply Theorem 1 in Biggins [5], which completes the
proof of the first part of our claim. Finally, the assertion that the terminal value $W(\theta) > 0$ a.s.
derives easily from the fact that the probability that the branching random walk is extinguished at
generation $k$ equals 0 for every $k \in \mathbb{N}$. □



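In order to state the key technical result for the present study, we define for every $\theta > \theta_*$
the tilted random point measure

$$Z_{\theta}^{(k)}(dy) := \frac{e^{-\theta y}}{L(\theta)^k} Z^{(k)}(dy), \qquad y \ge 0.$$

We also introduce the mean and the variance of the intensity measure of $Z_{\theta}^{(1)}$:

$$M(\theta) := -\frac{L'(\theta)}{L(\theta)} \quad \text{and} \quad V(\theta) := \frac{L''(\theta)}{L(\theta)} - \Big( \frac{L'(\theta)}{L(\theta)} \Big)^2. \qquad (14)$$

Both $M(\theta)$ and $V(\theta)$ are positive quantities, and we write $g_{\theta}$ for the (centered) Gaussian density
with variance $V(\theta)$, i.e.

$$g_{\theta}(x) := \frac{1}{\sqrt{2\pi V(\theta)}} \exp\Big( -\frac{x^2}{2V(\theta)} \Big).$$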
The next statement is a version of Theorem 4 in Biggins [5] specialized to our framework.

Lemma 4 The following assertion holds with probability one:

$$\lim_{k \to \infty} \Big| \sqrt{k}\, Z_{\theta}^{(k)}\big([x + kM(\theta) - h,\, x + kM(\theta) + h[\big) - 2h\, W(\theta)\, g_{\theta}(x/\sqrt{k}) \Big| = 0,$$

where the limit is uniform for $x \in \mathbb{R}$, $h \le 1$ and $\theta$ in a compact subset of $]\theta_*, \theta^*[$.
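Next, observe that if $f : \mathbb{R} \to \mathbb{R}_+$ is, say, a continuous function, then

$$\sum_{i \in \mathbb{N}^k} f\big(kM(\theta) + \ln p_i(k)\big) = \int f\big(kM(\theta) - y\big) Z^{(k)}(dy)
= L(\theta)^k e^{\theta M(\theta) k} \int f\big(kM(\theta) - y\big) e^{-\theta(kM(\theta) - y)} Z_{\theta}^{(k)}(dy).$$

Recall also that the rate function $\varphi$ is defined by (12), so that $L(\theta)^k e^{\theta M(\theta) k} = e^{k\varphi(\theta)}$. We finally state
the following limit theorem which will be useful to estimate the conditional mean number of
occupied boxes given the multiplicative cascade.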

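Corollary 1 (large deviations behavior) Pick $\theta \in ]\theta_*, \theta^*[$ and let $f : \mathbb{R} \to \mathbb{R}_+$ be a continuous
function. Assume that there exist $\alpha > 0$ and $\beta > \theta$ such that

$$\lim_{y \to \infty} y^{-\alpha} f(y) = 0 \quad \text{and} \quad \lim_{y \to -\infty} e^{-\beta y} f(y) = 0,$$

so in particular $f \in L^1(e^{-\theta y} dy)$. Let also $(c_k : k \in \mathbb{N})$ denote a sequence of real numbers which
converges to some $c \in \mathbb{R}$. Then with probability one, we have

$$\lim_{k \to \infty} \sqrt{k}\, e^{-k\varphi(\theta)} \sum_{i \in \mathbb{N}^k} f\big(kM(\theta) + \ln p_i(k) + c_k\big) = \frac{e^{\theta c}}{\sqrt{2\pi V(\theta)}} \Big( \int_{-\infty}^{\infty} f(y) e^{-\theta y} dy \Big) W(\theta).$$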
Corollary 1 follows readily from Lemma 4 when $f$ has bounded support. However, the
derivation in the case when the function $f$ has unbounded support is rather technical, even
though our assumptions have been tailored for the purpose of the present work. We postpone
the proof to the Appendix.



3 Asymptotic regimes 
3.1 Main results 

We now have all the technical ingredients for our study; we just need to introduce a little more
notation. We shall consider the regime for pairs of integers $(k, n)$ such that

$$k, n \to \infty \quad \text{and} \quad k - a \ln n \to b, \qquad (15)$$

where $a > 0$ and $b \in \mathbb{R}$ are fixed. When $F(k, n)$ is some function depending on $k$ and $n$, we
shall write

$$\lim_{a,b} F(k, n)$$

for the limit of $F(k, n)$ when $(k, n)$ follows the regime (15), provided of course that such a limit
exists.

Recall that our basic datum is the splitting law $\nu$ on $\mathrm{Prob}_{\mathbb{N}}$, and that its Laplace transform
is given by (10). Further important notions include the critical parameters $\theta_*$, $\theta^*$, the rate
function $\varphi$, and the mean $M$ and variance $V$ functions, which have been defined in (11), (13),
(12) and (14), respectively. The mean function $M$ decreases continuously on $]\theta_*, \theta^*[$ and takes
positive values. We denote the inverse bijection by

$$M^{-1} : ]M_*, M^*[ \to ]\theta_*, \theta^*[,$$

where

$$M_* = \lim_{\theta \to \theta^*-} M(\theta) \quad \text{and} \quad M^* = \lim_{\theta \to \theta_*+} M(\theta).$$

One always has $]M_*, M^*[ \subseteq ]0, \infty[$, and the inclusion can be strict.

Example: These quantities are especially simple in the case when $\nu$ is the Poisson-Dirichlet
distribution PD(1). Indeed, one easily gets $L(\theta) = 1/\theta$ for $\theta > 0$, and then $\varphi(\theta) = -\ln \theta + 1$.
One thus sees that $\theta_* = 0$ and $\theta^* = e$. Finally $M(\theta) = 1/\theta$ and $V(\theta) = 1/\theta^2$, so $M_* = 1/e$,
$M^* = \infty$ and $M^{-1}(a) = 1/a$.
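
The closed forms of this example can be checked numerically from $L(\theta) = 1/\theta$ alone, using $M = -(\ln L)'$, $V = (\ln L)''$ and $\varphi = \ln L + \theta M$, which are restatements of (12) and (14). The sketch below uses central finite differences; the step size `h` and the helper names are arbitrary choices made for illustration.

```python
import math

def L(theta):
    """Laplace transform in the PD(1) example: L(theta) = 1/theta."""
    return 1.0 / theta

def log_L_derivatives(theta, h=1e-4):
    """Central finite differences for the first two derivatives of ln L at theta."""
    f = lambda t: math.log(L(t))
    d1 = (f(theta + h) - f(theta - h)) / (2 * h)
    d2 = (f(theta + h) - 2 * f(theta) + f(theta - h)) / h ** 2
    return d1, d2

def M(theta):
    return -log_L_derivatives(theta)[0]            # M = -(ln L)'

def V(theta):
    return log_L_derivatives(theta)[1]             # V = (ln L)''

def phi(theta):
    return math.log(L(theta)) + theta * M(theta)   # phi = ln L - theta * L'/L
```

Evaluating `phi(math.e)` returns a value numerically indistinguishable from zero, in line with $\theta^* = e$.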

Finally recall from Lemma 3 that for every $\theta \in ]\theta_*, \theta^*[$, $W(\theta)$ is a positive random variable
which arises as the limit of a remarkable martingale. We are now able to specify regimes for
the (strong) law of large numbers for the number of boxes occupied by exactly $j$ balls in an
occupancy scheme driven by a multiplicative cascade.

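Theorem 1 Pick $a \in ]1/M^*, 1/M_*[$ and $b \in \mathbb{R}$, and set $\theta = M^{-1}(1/a)$. Then for every integer
$j > \theta$, the following limits

$$\lim_{a,b} \sqrt{k}\, e^{-k\varphi(\theta)} \bar{N}_{n,j-1}^{(k)} = \frac{\Gamma(j - \theta)\, e^{-\theta b/a}}{\theta\, (j-1)!\, \sqrt{2\pi V(\theta)}}\, W(\theta)$$

and

$$\lim_{a,b} \sqrt{k}\, e^{-k\varphi(\theta)} N_{n,j}^{(k)} = \frac{\Gamma(j - \theta)\, e^{-\theta b/a}}{j!\, \sqrt{2\pi V(\theta)}}\, W(\theta)$$

hold with probability one.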

Remark: It may be interesting to recall that $e^{k\varphi(\theta)}/\sqrt{k}$ is also the order of magnitude of the
numbers of boxes of size approximately $e^{-k/a} \approx 1/n$ at generation $k$; see Corollary 3 in [3]. We
further stress that $\varphi(\theta) \le 1/a$ and that this inequality is strict except when $\theta = 1$ (this can be
checked directly from the observation that $\ln L$ is a strictly convex function).

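Proof: We work conditionally on the random probabilities $p(k)$ using (2). We aim at applying
Lemma 1 and in this direction we fix some real number $\alpha$ close to 1 and observe that

$$\bar{\mu}_{j-1}^{p(k)}(\alpha n) = \sum_{i \in \mathbb{N}^k} \Big( 1 - e^{-p_i(k)\alpha n} \Big( 1 + p_i(k)\alpha n + \cdots + \frac{(p_i(k)\alpha n)^{j-1}}{(j-1)!} \Big) \Big)
= \sum_{i \in \mathbb{N}^k} f\big(k/a + \ln p_i(k) + c_k\big)$$

with

$$f(x) = 1 - \Big( 1 + e^x + \cdots + \frac{e^{(j-1)x}}{(j-1)!} \Big) \exp\{-e^x\}$$

and

$$c_k := \ln n + \ln \alpha - k/a = -b/a + \ln \alpha + o(1).$$

We can now check that the assumptions of Corollary 1 hold with $c = -b/a + \ln \alpha$. A
straightforward calculation shows that

$$\int_{-\infty}^{\infty} f(y) e^{-\theta y} dy = \frac{\Gamma(j - \theta)}{\theta\, (j-1)!},$$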
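and then, invoking Corollary 1 and recalling that $\theta = M^{-1}(1/a)$, we deduce

$$\lim_{a,b} \sqrt{k}\, e^{-k\varphi(\theta)} \bar{\mu}_{j-1}^{p(k)}(\alpha n) = \frac{\Gamma(j - \theta)\, e^{\theta(-b/a + \ln \alpha)}}{\theta\, (j-1)!\, \sqrt{2\pi V(\theta)}}\, W(\theta). \qquad (16)$$

Next, we fix $\eta \in [-1, 1]$ and define for every integer $k$

$$n_{k,\eta} := \lfloor \exp((k - b - \eta)/a) \rfloor.$$

Replacing $b$ by $b + \eta$ and $n$ by $n_{k,\eta}$ in (16), we get

$$\lim_{k \to \infty} \sqrt{k}\, e^{-k\varphi(\theta)} \bar{\mu}_{j-1}^{p(k)}(\alpha n_{k,\eta}) = \frac{\Gamma(j - \theta)\, e^{\theta(-(b+\eta)/a + \ln \alpha)}}{\theta\, (j-1)!\, \sqrt{2\pi V(\theta)}}\, W(\theta).$$

Note that the right-hand side depends continuously on the variable $\alpha$ and that the requirement
(3) in Lemma 1 is fulfilled, as $\varphi(\theta) > 0$. An application of this lemma gives

$$\lim_{k \to \infty} \sqrt{k}\, e^{-k\varphi(\theta)} \bar{N}_{n_{k,\eta},\, j-1}^{(k)} = \frac{\Gamma(j - \theta)\, e^{-\theta(b+\eta)/a}}{\theta\, (j-1)!\, \sqrt{2\pi V(\theta)}}\, W(\theta)$$

with probability one. An argument of monotonicity, namely

$$n_{k,\eta} \le n \le n_{k,\eta'} \implies \bar{N}_{n_{k,\eta},\, j-1}^{(k)} \le \bar{N}_{n,\, j-1}^{(k)} \le \bar{N}_{n_{k,\eta'},\, j-1}^{(k)},$$

completes the proof of our first claim. The second follows immediately from the first, since
$N_{n,j}^{(k)} = \bar{N}_{n,j-1}^{(k)} - \bar{N}_{n,j}^{(k)}$. □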

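We next turn our attention to finer asymptotic results for the total number of occupied boxes
$\bar{N}_{n,0}^{(k)} = N_n^{(k)}$. If $\theta < 1$, that is if $a < 1/M(1)$, then we can take $j = 1$ in Theorem 1 and we get
from the easy identity

$$\int_{-\infty}^{\infty} \big(1 - \exp\{-e^y\}\big) e^{-\theta y} dy = \frac{\Gamma(1 - \theta)}{\theta}$$

that

$$\lim_{a,b} \sqrt{k}\, e^{-k\varphi(\theta)} N_n^{(k)} = \frac{\Gamma(1 - \theta)\, e^{-\theta b/a}}{\theta \sqrt{2\pi V(\theta)}}\, W(\theta) \quad \text{a.s.} \qquad (17)$$

Recall also from (8) that the conditional expectation of the number of occupied boxes given the
de Finetti measure, $\mathbb{E}(N_n^{(k)} \mid p(k))$, can be approximated by $\mu_{p(k)}(n) = \bar{\mu}_0^{p(k)}(n)$, and that (16)
provides an estimation of the latter. This gives

$$\lim_{a,b} \sqrt{k}\, e^{-k\varphi(\theta)} \mu_{p(k)}(n) = \frac{\Gamma(1 - \theta)\, e^{-\theta b/a}}{\theta \sqrt{2\pi V(\theta)}}\, W(\theta). \qquad (18)$$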

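Recall further (7) and (9) and consider the following approximation for the conditional variance

$$\sigma^2_{p(k)}(n) := \sum_{i \in \mathbb{N}^k} e^{-p_i(k) n}\big(1 - e^{-p_i(k) n}\big) - n^{-1} \Big( \sum_{i \in \mathbb{N}^k} n\, p_i(k)\, e^{-p_i(k) n} \Big)^2.$$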



Theorem 2 Notation is the same as in Theorem 1. Provided that $\theta < 1$, we have that

$$\lim_{a,b} \frac{\sigma^2_{p(k)}(n)}{\mu_{p(k)}(n)} = 2^{\theta} - 1 \quad \text{a.s.}$$

As a consequence, when $(k, n)$ follows the regime (15),

$$\frac{N_n^{(k)} - \mu_{p(k)}(n)}{\sqrt{(2^{\theta} - 1)\, \mu_{p(k)}(n)}}$$

converges in distribution to a standard normal variable.
It is interesting to note that the parameter $b$ plays no role in the limits above, so by a standard
argument based on extraction of sub-sequences, the results still hold under the weaker requirement
that $k, n \to \infty$ such that $k = a \ln n + O(1)$. We do not know whether this can be extended
to the more general regime when one merely requires that $k \sim a \ln n$. We also underline that
the usefulness of Theorem 2 is limited by the fact that the centralizing term $\mu_{p(k)}(n)$ in the
numerator is random, and only its first order asymptotic (18) is known.

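Proof: The calculations resemble those in the proof of Theorem 1. We start by observing that

$$\sigma^2_{p(k)}(n) = \sum_{i \in \mathbb{N}^k} f\big(k/a + \ln p_i(k) + c_k\big) - n^{-1} \Big( \sum_{i \in \mathbb{N}^k} g\big(k/a + \ln p_i(k) + c_k\big) \Big)^2$$

with

$$f(x) = \exp\{-e^x\}\big(1 - \exp\{-e^x\}\big), \qquad g(x) = e^x \exp\{-e^x\}$$

and

$$c_k := \ln n - k/a = -b/a + o(1).$$

Easy calculations show that

$$\int_{-\infty}^{\infty} f(y) e^{-\theta y} dy = \frac{(2^{\theta} - 1)\, \Gamma(1 - \theta)}{\theta}$$

and

$$\int_{-\infty}^{\infty} g(y) e^{-\theta y} dy = \Gamma(1 - \theta).$$

Applying Corollary 1 and recalling that $\theta = M^{-1}(1/a)$, we deduce that almost surely

$$\lim_{a,b} \sqrt{k}\, e^{-k\varphi(\theta)} \sigma^2_{p(k)}(n) = (2^{\theta} - 1) \frac{\Gamma(1 - \theta)\, e^{-\theta b/a}}{\theta \sqrt{2\pi V(\theta)}}\, W(\theta)
= (2^{\theta} - 1) \lim_{a,b} \sqrt{k}\, e^{-k\varphi(\theta)} \mu_{p(k)}(n),$$

which is our first claim. The second then derives from Lemma 2. □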
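
The two integral identities used in this proof can be verified numerically. The sketch below approximates both integrals by the trapezoidal rule for the illustrative value `theta = 0.5`; the truncation bounds, the number of steps and the helper name `integrate` are assumptions made for the example, and `math.expm1` is used only to evaluate $1 - e^{-t}$ accurately for small $t$.

```python
import math

def integrate(fn, theta, lo=-60.0, hi=6.0, steps=66000):
    """Trapezoidal approximation of the integral of fn(y) * exp(-theta * y) over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for m in range(steps + 1):
        y = lo + m * h
        w = 0.5 if m in (0, steps) else 1.0   # half weight at both endpoints
        total += w * fn(y) * math.exp(-theta * y)
    return total * h

def f(y):
    t = math.exp(y)
    return math.exp(-t) * (-math.expm1(-t))   # exp(-e^y) * (1 - exp(-e^y))

def g(y):
    t = math.exp(y)
    return t * math.exp(-t)                   # e^y * exp(-e^y)

theta = 0.5
lhs_f = integrate(f, theta)
rhs_f = (2 ** theta - 1) * math.gamma(1 - theta) / theta
lhs_g = integrate(g, theta)
rhs_g = math.gamma(1 - theta)
```

The ratio `rhs_f / (rhs_g / theta)` equals $2^{\theta} - 1$, which is the constant appearing in Theorem 2.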

Roughly speaking, the estimation (17) means that for $a < 1/M(1)$,

$$n \asymp e^{k/a} \implies N_n^{(k)} \asymp e^{k\varphi(\theta)}/\sqrt{k}.$$

We shall point out that this is no longer true when $a > 1/M(1)$, in other words that a phase
transition occurs at the critical value $a = 1/M(1)$. In this direction, we consider the more
general regime for pairs of integers $(k, n)$ such that

$$k, n \to \infty \quad \text{and} \quad k \sim a \ln n, \qquad (19)$$

for some fixed $a > 0$. Again, when $F(k, n)$ is some function depending on $k$ and $n$, we shall
write

$$\lim_{a} F(k, n)$$

for the limit (whenever it exists) of $F(k, n)$ when $(k, n)$ follows the regime (19).
Proposition 1 If $1/M(1) < a$, then

$$\lim_{a} \frac{N_n^{(k)}}{n} = 1 \quad \text{a.s.}$$

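Proof: By an argument similar to that in the proof of Theorem 1, it suffices to check that

$$\lim_{a} (\alpha n)^{-1} \mu_{p(k)}(\alpha n) = 1 \quad \text{a.s.}$$

provided that $\alpha$ is sufficiently close to 1. The upper bound $\mu_{p(k)}(\alpha n) \le \alpha n$ is plain, so we shall
focus on the lower bound.

In this direction, integrating the obvious inequality $(1 + \varepsilon) x^{\varepsilon} \ge 1 - e^{-x}$ for $x \ge 0$ and
$0 < \varepsilon \le 1$ arbitrary, we see that

$$1 - e^{-x} \ge x - x^{1+\varepsilon}. \qquad (20)$$

Replacing $x$ by the random variable $\alpha n p_i(k)$ and summing over $i \in \mathbb{N}^k$, we deduce that

$$\mu_{p(k)}(\alpha n) \ge \alpha n - (\alpha n)^{1+\varepsilon} \sum_{i \in \mathbb{N}^k} p_i(k)^{1+\varepsilon} = \alpha n - (\alpha n)^{1+\varepsilon} L(1 + \varepsilon)^k\, W^{(k)}(1 + \varepsilon).$$

As $W^{(k)}(1 + \varepsilon)$ is a positive martingale (in the variable $k$), we have $\sup_k W^{(k)}(1 + \varepsilon) < \infty$ a.s.
On the other hand, we have

$$\ln\big(n^{1+\varepsilon} L(1 + \varepsilon)^k\big) = (1 + \varepsilon) \ln n + k \ln L(1 + \varepsilon)
= \big(1 + \varepsilon + a \ln L(1 + \varepsilon) + o(1)\big) \ln n.$$

Recall that $-M$ is the derivative of $\ln L$ and that $\ln L(1) = 0$. Since $aM(1) > 1$, we can choose
$\varepsilon > 0$ small enough so that $a \ln L(1 + \varepsilon) < -\varepsilon$. Then

$$(\alpha n)^{1+\varepsilon} L(1 + \varepsilon)^k = o(n),$$

and we conclude that

$$\liminf_{a}\, (\alpha n)^{-1} \mu_{p(k)}(\alpha n) \ge 1 \quad \text{a.s.,}$$

which completes the proof. □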

For $\theta^* < 2$, a related argument also enables us to estimate the lowest generation at which
all $n$ balls fall into different boxes. More generally, recall that for every integer $j$, $\bar{N}_{n,j}^{(k)}$ denotes
the number of boxes at generation $k$ which contain more than $j$ balls when $n$ balls have been
thrown. Note that this quantity increases with the number of balls $n$ and decreases with the
generation $k$. Define

$$\zeta_{n,j} := \min\{k \in \mathbb{N} : \bar{N}_{n,j}^{(k)} = 0\}.$$
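
For intuition, the shattering generation can be simulated directly by following only the boxes that still hold more than j balls, since a box with at most j balls cannot produce a crowded descendant. The sketch below is an illustration under assumptions that are not in the text: every individual has `m = 3` children and the hypothetical splitting law `split` uses normalised i.i.d. exponential weights. It requires `j >= 1`, because with `j = 0` an occupied box exists at every generation.

```python
import random

def split(rng, m=3):
    """Hypothetical splitting law: normalised i.i.d. exponential weights on m children."""
    w = [rng.expovariate(1.0) for _ in range(m)]
    s = sum(w)
    return [x / s for x in w]

def shattering_generation(n, j, rng, m=3):
    """First generation k >= 1 at which no box holds more than j balls (j >= 1)."""
    if j < 1:
        raise ValueError("j must be at least 1, otherwise the loop never ends")
    crowded = [n] if n > j else []     # ball counts of boxes holding more than j balls
    k = 0
    while True:
        k += 1
        nxt = []
        for c in crowded:
            children = [0] * m
            # each crowded box gets an independent split; its balls choose children i.i.d.
            for idx in rng.choices(range(m), weights=split(rng, m), k=c):
                children[idx] += 1
            nxt.extend(x for x in children if x > j)
        if not nxt:
            return k
        crowded = nxt

z = shattering_generation(200, 1, random.Random(2))   # fixed seed, for reproducibility only
```

With at most `m**k` boxes at generation `k`, the returned value always satisfies `j * m**z >= n`, which gives a crude deterministic lower bound of logarithmic order in `n`.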
Proposition 2 If $\theta^* < j + 1$, then we have

$$\lim_{n \to \infty} \frac{\zeta_{n,j}}{\ln n} = 1/M_* \quad \text{a.s.}$$
Proof: We first consider the randomized version with a total number of balls $\mathsf{n}_x$ which has the
Poisson law with parameter $x$. We write $\bar{\mathsf{N}}_{x,j}^{(k)}$ for the number of boxes at generation $k$ occupied
by more than $j$ balls, and consider its conditional expectation given the random probability
measure $p(k)$, which has been computed in (4). Using the assumption that $\theta^* < j + 1$ at the
second line below, we get

$$\mathbb{E}\big(\bar{\mathsf{N}}_{x,j}^{(k)} \mid p(k)\big) = \bar{\mu}_j^{p(k)}(x) = \sum_{i \in \mathbb{N}^k} \Big( 1 - e^{-x p_i(k)} \Big( 1 + x p_i(k) + \cdots + \frac{(x p_i(k))^j}{j!} \Big) \Big)$$
$$\le c(\theta^*) \sum_{i \in \mathbb{N}^k} \big(x p_i(k)\big)^{\theta^*},$$

where

$$c(\theta^*) = \max_{y > 0}\, y^{-\theta^*} \Big( 1 - e^{-y} \Big( 1 + \cdots + \frac{y^j}{j!} \Big) \Big)$$

is some finite constant. Observe that the preceding upper bound can be expressed as

$$c(\theta^*)\, x^{\theta^*} L(\theta^*)^k\, W^{(k)}(\theta^*),$$

and recall that $W^{(k)}(\theta^*)$ is a martingale. The unconditional expectation $\mathbb{E}\big(\bar{\mathsf{N}}_{x,j}^{(k)}\big)$ can thus be
bounded from above by $c(\theta^*)\, x^{\theta^*} L(\theta^*)^k$.

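We next pick $a > 1/M_*$ and take $x = e^{k/a}$. Recall that

$$-\frac{\ln L(\theta^*)}{\theta^*} = M(\theta^*) = M_*.$$

We thus have

$$\ln\big(x^{\theta^*} L(\theta^*)^k\big) = k\theta^* \Big( \frac{1}{a} - M_* \Big)$$

and as a consequence

$$\sum_{k \in \mathbb{N}} \mathbb{E}\big(\bar{\mathsf{N}}_{e^{k/a},\, j}^{(k)}\big) < \infty. \qquad (21)$$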

Then choose any $a' > a$ and recall that the Poisson process fulfills

$$\lim_{k_0 \to \infty} \mathbb{P}\big( \mathsf{n}_{e^{k/a}} \ge \lfloor e^{(k+1)/a'} \rfloor \text{ for all } k \ge k_0 \big) = 1.$$

We deduce from (21) and an argument of monotonicity that

$$\bar{N}^{(k)}_{\lfloor e^{(k+1)/a'} \rfloor,\, j} = 0 \quad \text{for all integers } k \text{ sufficiently large, a.s.}$$

We observe that for every integer $n$ and $k = \lfloor a' \ln n \rfloor$:

$$\zeta_{n,j} > k \implies \bar{N}_{n,j}^{(k)} \ge 1 \implies \bar{N}^{(k)}_{\lfloor e^{(k+1)/a'} \rfloor,\, j} \ge 1,$$

and therefore

$$\limsup_{n \to \infty} \frac{\zeta_{n,j}}{\ln n} \le a' \quad \text{a.s.}$$

As $a'$ can be chosen arbitrarily close to $1/M_*$, we conclude that

$$\limsup_{n \to \infty} \frac{\zeta_{n,j}}{\ln n} \le 1/M_* \quad \text{a.s.}$$

Finally, the converse bound

$$\liminf_{n \to \infty} \frac{\zeta_{n,j}}{\ln n} \ge 1/M_* \quad \text{a.s.}$$

derives easily from Theorem 1. □

3.2 Interpretation in terms of homogeneous fragmentations 

The initial motivation for this work was to gain insight on certain asymptotic regimes for 
homogeneous fragmentations. Roughly, the latter form coherent families of natural Markov 
processes with values in the space of partitions of finite sets, such that these random partitions 
are refining as time passes. They are closely related to multiplicative cascades of random 
probability measures and to the occupancy scheme; we start by giving precise definitions. 

A non-empty set $B$ of positive integers is called a block, and a partition of $B$ is a denumerable
family $\pi = \{\pi_1, \pi_2, \ldots\}$ of pairwise disjoint sub-blocks of $B$ such that $\bigcup_j \pi_j = B$. We write $\mathrm{Part}_B$
for the set of partitions of $B$ and endow $\mathrm{Part}_B$ with a natural partial order: one says that a
partition $\pi$ is finer than another partition $\pi'$, and then writes $\pi \preceq \pi'$, if and only if each block
$\pi_j$ of $\pi$ is contained in some block $\pi'_l$ of $\pi'$. A sequence $(\pi(k), k \ge 0)$ of partitions is called
nested (or, sometimes also, refining) if $\pi(k + 1)$ is finer than $\pi(k)$ for every integer $k \ge 0$.

The occupancy scheme produces naturally random partitions. Typically, we consider some 
discrete probability measure $p \in \mathrm{Prob}_{\mathcal{I}}$ and the corresponding family of boxes, and we label
balls by integers. Every block $B$ can then be split into sub-blocks that correspond to the
labels of the balls which occupy the same box. This provides a random partition of $B$, say
$\pi_B$. The latter is exchangeable, in the sense that its distribution is invariant under the natural
action of permutations of $B$. Note that when $B$ is infinite, in particular when $B = \mathbb{N}$, $\pi_B$ has no
singletons a.s. A fundamental theorem due to Kingman [17] (see Theorem 2.1 in [2]) claims that
any exchangeable random partition of N which has no singletons a.s. has the same distribution 
as the partition that results from some randomized version of the occupancy scheme, i.e. for 
which p is now a random discrete probability measure. The assumption of absence of singletons 
can be dropped provided that one allows p to be defective, i.e. to be only a sub-probability 
measure. 

We now consider again a sequence $(p(k), k \in \mathbb{N})$ of random discrete probability measures
which is associated to some multiplicative cascade as in (1), and for every $k \in \mathbb{N}$, we denote by
$\Pi(k)$ the random partition of $\mathbb{N}$ induced as above by the occupancy scheme at generation $k$.
Then $\Pi = (\Pi(k), k \ge 0)$ is a nested sequence of exchangeable random partitions of $\mathbb{N}$, which is
Markovian. We call $\Pi$ a homogeneous fragmentation chain. Its transition probabilities inherit
the branching property from the multiplicative structure of the cascade; see Propositions 1.2 
and 1.3 in 0. 

For any pair of blocks $B$ and $B'$ with $B' \subseteq B$, the restriction to $B'$ yields a natural projection
$\pi \mapsto \pi|_{B'}$ from $\mathrm{Part}_B$ to $\mathrm{Part}_{B'}$. The partial order $\preceq$ is clearly compatible with restrictions, in
the sense that $\pi \preceq \pi' \implies \pi|_{B'} \preceq \pi'|_{B'}$. Thus, if $(\pi(k), k \ge 0)$ is a nested sequence of partitions
of a block $B$ and if $B' \subseteq B$ is a smaller block, then the sequence $(\pi|_{B'}(k), k \ge 0)$ of partitions
restricted to $B'$ is again nested. A simple but nonetheless important fact is that the Markov
property of a homogeneous fragmentation chain $\Pi$ is preserved by restriction, in the sense that
for every block $B \subseteq \mathbb{N}$, the nested sequence of partitions of $B$, $\Pi|_B = (\Pi|_B(k), k \ge 0)$, is still
Markovian; see Lemma 3.4 in [2] for a sharper statement.

Recapitulating, the occupancy scheme enables us to associate to any random multiplicative
cascade of discrete probability measures a homogeneous fragmentation chain $\Pi$. In turn,
the latter provides a nested sequence of random partitions of an arbitrary block $B \subseteq \mathbb{N}$,
$\Pi_{|B} = (\Pi_{|B}(k), k \ge 0)$, which is Markovian. Further, these Markov chains are coherent, in the
sense that if $B' \subseteq B$, then $\Pi_{|B'} = (\Pi_{|B})_{|B'}$. Roughly speaking, the statements in the preceding
section provide information about the asymptotic regimes for a homogeneous fragmentation
of a finite set, when both the size $n$ of that set and the time $k$ at which the fragmentation
process is observed tend to infinity. For example, Theorem 1 is a limit theorem for the number
of components with a fixed size (like singletons, pairs, etc.), whereas Proposition 2 specifies the
asymptotic behavior of the shattering time, that is the first instant at which the fragmentation
process of a finite block reaches its absorbing state, i.e. the partition of that block into singletons.
Finally, we also mention that, for the sake of simplicity, we have only discussed here
fragmentation processes in discrete time; however our results can be shifted to homogeneous
fragmentations in continuous time, using discretization techniques similar to those developed
in 0. 
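
To make the notion of shattering time concrete, here is a small simulation sketch. It assumes a hypothetical binary cascade in which every box splits its mass in proportions $(U, 1-U)$ with $U$ uniform; the function name `shattering_time` and the seed are also assumptions. It only illustrates the definition and is not meant to reproduce the asymptotics of the text.

```python
import random


def shattering_time(n, rng):
    """First generation at which balls 0..n-1 all sit in distinct boxes."""
    blocks = [list(range(n))] if n > 0 else []
    generation = 0
    while any(len(block) > 1 for block in blocks):
        next_blocks = []
        for block in blocks:
            if len(block) == 1:
                next_blocks.append(block)      # a singleton remains a singleton
                continue
            u = rng.random()                   # split ratio of the box holding `block`
            first, second = [], []
            for ball in block:
                (first if rng.random() < u else second).append(ball)
            next_blocks.extend(b for b in (first, second) if b)
        blocks = next_blocks
        generation += 1
    return generation


rng = random.Random(2)
times = [shattering_time(8, rng) for _ in range(200)]
```

In this binary setting a block of size 8 cannot be shattered before generation 3, since at most $2^k$ boxes are available at generation $k$.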

Appendix: Proof of the large deviations behavior

We finally proceed to the proof of Corollary 1. When $f$ is continuous with compact support,
the claim follows from Lemma H] by approximating $f$ with step functions. See e.g. Corollary 4
of [5] and Theorem 3 of Stone [20] for slightly stronger statements in terms of directly Riemann
integrable functions with compact support. All that is needed to extend this to continuous
functions with unbounded support (which fulfill the conditions of the statement) is to establish
the following: if we define, for some fixed $\alpha > 0$ and $\beta > \theta$,

\[ g_+(x) = \mathbf{1}_{\{x>0\}}\, x^{\alpha} \quad\text{and}\quad g_-(x) = \mathbf{1}_{\{x<0\}}\, e^{\beta x}, \]

then

\[ \sup_{k\in\mathbb{N}} \sqrt{k}\, e^{-k\varphi(\theta)} \sum_{i\in\mathbb{N}^k} g_{\pm}\bigl(k m(\theta) + \ln p_i(k)\bigr) < \infty \quad\text{a.s.} \tag{22} \]

Indeed, for every function $f$ that fulfills the hypotheses of the statement and for every integer
$\ell \ge 1$, we can find a continuous function $f_\ell$ with compact support such that $f_\ell \le f \le f_\ell + \ell^{-1}(g_+ + g_-)$,
and then (22) enables us to conclude the proof by a standard argument.

Proof of (22) for $g_+$: We write

\[ \sum_{i\in\mathbb{N}^k} g_+\bigl(k m(\theta) + \ln p_i(k)\bigr) = \int g_+\bigl(k m(\theta) - y\bigr)\, Z^{(k)}(dy) = \int_{[0,\, k m(\theta)]} \bigl(k m(\theta) - y\bigr)^{\alpha}\, Z^{(k)}(dy). \]

Then we pick $\theta' \in\, ]\theta, \theta^*[$ sufficiently close to $\theta$ (as will be explained in the sequel) and split
the last integral at $k m(\theta')$ to get

\[ \int_{[0,\, k m(\theta')]} \bigl(k m(\theta) - y\bigr)^{\alpha}\, Z^{(k)}(dy) + \int_{]k m(\theta'),\, k m(\theta)]} \bigl(k m(\theta) - y\bigr)^{\alpha}\, Z^{(k)}(dy). \]

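For the first integral, there is the obvious bound

\[ \int_{[0,\, k m(\theta')]} \bigl(k m(\theta) - y\bigr)^{\alpha}\, Z^{(k)}(dy) \le \bigl(k m(\theta)\bigr)^{\alpha}\, Z^{(k)}\bigl([0, k m(\theta')]\bigr). \]

Observe from Markov's inequality that $Z^{(k)}([0,a]) \le e^{\theta a} L(\theta)^k W^{(k)}(\theta)$ for every $a > 0$, so the
preceding quantity can be bounded from above by

\[ \bigl(k m(\theta)\bigr)^{\alpha}\, e^{\theta k m(\theta')}\, L(\theta)^k\, W^{(k)}(\theta). \]

Recall that $\varphi(\theta) = \ln L(\theta) + \theta m(\theta)$ and that $m(\theta') < m(\theta)$. As a consequence

\[ \bigl(k m(\theta)\bigr)^{\alpha}\, e^{\theta k m(\theta')}\, L(\theta)^k = o\bigl(e^{k\varphi(\theta)}\bigr)/\sqrt{k}, \qquad k \to \infty, \]

and since the martingale $W^{(k)}(\theta)$ remains bounded a.s., we conclude that

\[ \lim_{k\to\infty} \sqrt{k}\, e^{-k\varphi(\theta)} \int_{[0,\, k m(\theta')]} \bigl(k m(\theta) - y\bigr)^{\alpha}\, Z^{(k)}(dy) = 0 \quad\text{a.s.} \tag{23} \]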
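
The Markov-inequality bound used for the first integral can be checked numerically on a simulated cascade. The sketch below rests on two assumptions that are not taken from the text: the cascade is binary with uniform split ratios, and $L(\theta)^k W^{(k)}(\theta)$ is read as the sum $\sum_i p_i(k)^{\theta}$, as the form of the inequality suggests. The function names are hypothetical.

```python
import math
import random


def cascade_masses(k, rng):
    """Masses p_i(k) at generation k of a hypothetical binary cascade."""
    masses = [1.0]
    for _ in range(k):
        next_masses = []
        for mass in masses:
            u = rng.random()                       # uniform split ratio
            next_masses.extend((mass * u, mass * (1.0 - u)))
        masses = next_masses
    return masses


def count_up_to(masses, a):
    """Z^(k)([0, a]): number of individuals i with -ln p_i(k) <= a."""
    return sum(1 for p in masses if p > 0 and -math.log(p) <= a)


def markov_bound(masses, a, theta):
    """exp(theta * a) times the sum of p_i(k)^theta over all individuals."""
    return math.exp(theta * a) * sum(p ** theta for p in masses if p > 0)


masses = cascade_masses(10, random.Random(3))
comparisons = [
    (a, theta, count_up_to(masses, a), markov_bound(masses, a, theta))
    for a in (2.0, 5.0, 8.0)
    for theta in (0.5, 1.0, 2.0)
]
```

Each individual with $-\ln p_i(k) \le a$ contributes at least 1 to $e^{\theta a} p_i(k)^{\theta}$, which is why the count never exceeds the bound in `comparisons`.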

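For the second integral, we start from the bound

\[ \int_{]k m(\theta'),\, k m(\theta)]} \bigl(k m(\theta) - y\bigr)^{\alpha}\, Z^{(k)}(dy) \le \sum_{0 \le j \le k(m(\theta) - m(\theta'))} j^{\alpha}\, Z^{(k)}\bigl([k m(\theta) - j,\, k m(\theta) - j + 1]\bigr). \]

For indices $0 \le j \le k(m(\theta) - m(\theta'))$, we define

\[ \theta_{k,j} = m^{-1}\bigl(m(\theta) - j/k\bigr) \in [\theta, \theta'], \tag{24} \]

so that

\[ k m(\theta_{k,j}) = k m(\theta) - j. \]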

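We observe that

\[
\begin{aligned}
Z^{(k)}\bigl([k m(\theta) - j,\, k m(\theta) - j + 1]\bigr)
&= L(\theta_{k,j})^k \int_{k m(\theta) - j}^{k m(\theta) - j + 1} e^{\theta_{k,j} y}\, Z^{(k)}_{\theta_{k,j}}(dy) \\
&\le e^{\theta_{k,j}(k m(\theta) - j + 1)}\, L(\theta_{k,j})^k\, Z^{(k)}_{\theta_{k,j}}\bigl([k m(\theta) - j,\, k m(\theta) - j + 1]\bigr) \\
&\le e^{-(j-1)\theta}\, \bigl(e^{\theta_{k,j} m(\theta)} L(\theta_{k,j})\bigr)^k\, Z^{(k)}_{\theta_{k,j}}\bigl([k m(\theta_{k,j}),\, k m(\theta_{k,j}) + 1]\bigr),
\end{aligned}
\]

where for the last inequality, we use the fact that $\theta_{k,j} \ge \theta$.
Recall that $\ln L$ is convex and has derivative $-m$. Since $\theta \le \theta_{k,j} \le \theta'$, we have

\[ \ln L(\theta_{k,j}) \le \ln L(\theta) - m(\theta')(\theta_{k,j} - \theta), \]

and since $\varphi(\theta) = \ln L(\theta) + \theta m(\theta)$, this yields

\[ \theta_{k,j}\, m(\theta) + \ln L(\theta_{k,j}) \le \varphi(\theta) + (\theta_{k,j} - \theta)\bigl(m(\theta) - m(\theta')\bigr). \]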
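
The last inequality can be sanity-checked numerically. The sketch below uses a hypothetical choice, $\ln L(\theta) = \ln 2 - \ln(1+\theta)$, picked only because it is convex with derivative $-m(\theta) = -1/(1+\theta)$, the two properties recalled above; it is not claimed to be the function $L$ of the text. The values of `theta`, `theta_prime` and `k` are likewise arbitrary.

```python
import math


def log_L(theta):
    """Hypothetical convex function playing the role of ln L."""
    return math.log(2.0) - math.log1p(theta)


def m(theta):
    """Minus the derivative of log_L; a decreasing function of theta."""
    return 1.0 / (1.0 + theta)


def m_inverse(x):
    """Inverse of m on its range."""
    return 1.0 / x - 1.0


def phi(theta):
    """phi(theta) = ln L(theta) + theta * m(theta)."""
    return log_L(theta) + theta * m(theta)


theta, theta_prime, k = 1.0, 1.5, 50
j_max = int(k * (m(theta) - m(theta_prime)))

checks = []
for j in range(j_max + 1):
    theta_kj = m_inverse(m(theta) - j / k)          # as in definition (24)
    lhs = theta_kj * m(theta) + log_L(theta_kj)
    rhs = phi(theta) + (theta_kj - theta) * (m(theta) - m(theta_prime))
    checks.append((j, theta_kj, lhs, rhs))
```

For every index in `checks`, `theta_kj` lies between `theta` and `theta_prime` and `lhs` does not exceed `rhs`, with equality at `j = 0`.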

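As the mean function $m$ is locally a regular diffeomorphism, we see from (24) that there is some
finite constant $C$ (which is independent of $k$ and $j$) such that $\theta_{k,j} - \theta \le C j/k$. Further, we also
have that $m(\theta) - m(\theta') \le \theta/(2C)$ provided that we choose $\theta'$ sufficiently close to $\theta$. Then

\[ \bigl(e^{\theta_{k,j} m(\theta)} L(\theta_{k,j})\bigr)^k \le e^{j\theta/2} \exp\bigl(k\varphi(\theta)\bigr), \]

and thus

\[ \sum_{0 \le j \le k(m(\theta) - m(\theta'))} j^{\alpha}\, e^{-j\theta}\, \bigl(e^{\theta_{k,j} m(\theta)} L(\theta_{k,j})\bigr)^k = O\bigl(\exp(k\varphi(\theta))\bigr). \]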

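On the other hand, we know from Lemma H] that there exists an a.s. finite random variable
$\xi$ such that

\[ Z^{(k)}_{\theta_{k,j}}\bigl([k m(\theta_{k,j}),\, k m(\theta_{k,j}) + 1]\bigr) \le k^{-1/2}\, \xi. \]

Putting the pieces together, we conclude that

\[ \sup_{k\in\mathbb{N}} \sqrt{k}\, e^{-k\varphi(\theta)} \int_{]k m(\theta'),\, k m(\theta)]} \bigl(k m(\theta) - y\bigr)^{\alpha}\, Z^{(k)}(dy) < \infty \quad\text{a.s.} \]

Combining with (23), we have thus checked that (22) does hold for $g_+$. □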

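The proof of the bound (22) for $g_-$ follows a similar route; however it may be useful to spell
out the main steps.

Proof of (22) for $g_-$: We start with

\[ \sum_{i\in\mathbb{N}^k} g_-\bigl(k m(\theta) + \ln p_i(k)\bigr) = \int_{[0,\infty[} e^{-\beta y}\, Z^{(k)}\bigl(k m(\theta) + dy\bigr). \]

We pick $\theta' \in\, ]\theta_*, \theta[$ sufficiently close to $\theta$ and split this integral at $k(m(\theta') - m(\theta))$.
For indices $0 \le j \le k(m(\theta') - m(\theta))$, we introduce

\[ \theta_{k,j} = m^{-1}\bigl(m(\theta) + j/k\bigr) \in [\theta', \theta], \tag{25} \]

so that

\[ k m(\theta_{k,j}) = k m(\theta) + j. \]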

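We observe that

\[
\begin{aligned}
\int_{[j,\, j+1]} e^{-\beta y}\, Z^{(k)}\bigl(k m(\theta) + dy\bigr)
&= L(\theta_{k,j})^k \int_{[j,\, j+1]} e^{-\beta y}\, e^{\theta_{k,j}(k m(\theta) + y)}\, Z^{(k)}_{\theta_{k,j}}\bigl(k m(\theta) + dy\bigr) \\
&\le L(\theta_{k,j})^k\, e^{-j(\beta - \theta_{k,j})}\, e^{k \theta_{k,j} m(\theta)}\, Z^{(k)}_{\theta_{k,j}}\bigl([k m(\theta) + j,\, k m(\theta) + j + 1]\bigr) \\
&\le e^{-j(\beta - \theta)}\, \bigl(e^{\theta_{k,j} m(\theta)} L(\theta_{k,j})\bigr)^k\, Z^{(k)}_{\theta_{k,j}}\bigl([k m(\theta_{k,j}),\, k m(\theta_{k,j}) + 1]\bigr),
\end{aligned}
\]

where for the last inequality, we use the fact that $\theta_{k,j} \le \theta$.

Because $\ln L$ is convex, has derivative $-m$ and $\theta' \le \theta_{k,j} \le \theta$, there is the inequality

\[ \ln L(\theta_{k,j}) \le \ln L(\theta) + m(\theta')(\theta - \theta_{k,j}), \]

which yields

\[ \theta_{k,j}\, m(\theta) + \ln L(\theta_{k,j}) \le \varphi(\theta) + (\theta - \theta_{k,j})\bigl(m(\theta') - m(\theta)\bigr). \]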

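We see from (25) that there is some finite constant $C$ (which is independent of $k$ and $j$) such
that $\theta - \theta_{k,j} \le C j/k$, and that $m(\theta') - m(\theta) \le (\beta - \theta)/(2C)$ provided that we choose $\theta'$ sufficiently
close to $\theta$. Then

\[ \bigl(e^{\theta_{k,j} m(\theta)} L(\theta_{k,j})\bigr)^k \le e^{j(\beta - \theta)/2} \exp\bigl(k\varphi(\theta)\bigr), \]

and thus

\[ \sum_{0 \le j \le k(m(\theta') - m(\theta))} e^{-j(\beta - \theta)}\, \bigl(e^{\theta_{k,j} m(\theta)} L(\theta_{k,j})\bigr)^k = O\bigl(\exp(k\varphi(\theta))\bigr). \]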

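Further, Lemma H] ensures the existence of an a.s. finite random variable $\xi$ such that

\[ Z^{(k)}_{\theta_{k,j}}\bigl([k m(\theta_{k,j}),\, k m(\theta_{k,j}) + 1]\bigr) \le k^{-1/2}\, \xi, \]

and then we can conclude that

\[ \sup_{k\in\mathbb{N}} \sqrt{k}\, e^{-k\varphi(\theta)} \int_{[0,\, k(m(\theta') - m(\theta))[} e^{-\beta y}\, Z^{(k)}\bigl(k m(\theta) + dy\bigr) < \infty \quad\text{a.s.} \tag{26} \]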

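For the remaining integral, we note that

\[ \int_{[k(m(\theta') - m(\theta)),\, \infty[} e^{-\beta y}\, Z^{(k)}\bigl(k m(\theta) + dy\bigr) \le e^{-k(\beta - \theta)(m(\theta') - m(\theta))} \int_{[k(m(\theta') - m(\theta)),\, \infty[} e^{-\theta y}\, Z^{(k)}\bigl(k m(\theta) + dy\bigr). \]

We readily deduce that

\[ \lim_{k\to\infty} \sqrt{k}\, e^{-k\varphi(\theta)} \int_{[k(m(\theta') - m(\theta)),\, \infty[} e^{-\beta y}\, Z^{(k)}\bigl(k m(\theta) + dy\bigr) = 0 \quad\text{a.s.}, \]

and combining with (26), this establishes that (22) holds for $g_-$. □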